Learning the Countability of English Nouns from Corpus Data

نویسندگان

  • Timothy Baldwin
  • Francis Bond
چکیده

This paper describes a method for learning the countability preferences of English nouns from raw text corpora. The method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. We were able to assign countability to English nouns with a precision of 94.6%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mass counts in World Englishes: A corpus linguistic study of noun countability in non-native varieties of English

Research on the morpho-syntax of non-native varieties of English has reported a widespread presence of mass noun pluralization such as baggages, equipments and softwares. In this paper we conducted a corpus linguistic study in order to provide empirically substantiated answers to this claim. We examined the purported prevalence of noun countability in World Englishes in a 1.9 billion-token mega...

متن کامل

Crosslingual Countability Classification: English meets Dutch

This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural only based on both Dutch and English data. The classification is based on the occurrence of countability specific linguistic features that are extracted from unannotated corpora. We show that in the absence of reliable Dutch gold standard data, cross-linguistic classification can be achieved o...

متن کامل

Using an Ontology to Determine English Countability

In this paper we show to what degree the countability of English nouns is predictable from their semantics. We found that at 78% of nouns’ countability could be predicted using an ontology of 2,710 nodes. We also show how this predictability can be used to aid non-native speakers to determine the countability of English nouns when building a bilingual machine translation lexicon.

متن کامل

A Plethora of Methods for Learning English Countability

This paper compares a range of methods for classifying words based on linguistic diagnostics, focusing on the task of learning countabilities for English nouns. We propose two basic approaches to feature representation: distribution-based representation, which simply looks at the distribution of features in the corpus data, and agreement-based representation which analyses the level of tokenwis...

متن کامل

Detecting the Countability of English Compound Nouns Using Web-based Models

In this paper, we proposed an approach for detecting the countability of English compound nouns treating the web as a large corpus of words. We classified compound nouns into three classes: countable, uncountable, plural only. Our detecting algorithm is based on simple, viable n-gram models, whose parameters can be obtained using the WWW search engine Google. The detecting thresholds are optimi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003